Who Wrote This Code? Identifying the Authors of Program Binaries
Authors
Abstract
Program authorship attribution—identifying a programmer based on stylistic characteristics of code—has practical implications for detecting software theft, digital forensics, and malware analysis. Authorship attribution is challenging in these domains where usually only binary code is available; existing source code-based approaches to attribution have left unclear whether and to what extent programmer style survives the compilation process. Casting authorship attribution as a machine learning problem, we present a novel program representation and techniques that automatically detect the stylistic features of binary code. We apply these techniques to two attribution problems: identifying the precise author of a program, and finding stylistic similarities between programs by unknown authors. Our experiments provide strong evidence that programmer style is preserved in program binaries.
Similar Resources
Poster: Atoms of Style: Identifying the Authors of Program Binaries
Being able to identify the author of a program has many applications in both academic and commercial environments. In most use cases, the source code is readily available, and this is reflected in the literature, as previous work has mostly focused on source code analyses. In contrast, scant research has been carried out on identifying the authors of executable program binaries. This would be m...
Machine Learning-Assisted Binary Code Analysis
Binary code analysis is a foundational technique in the areas of computer security, performance modeling, and program instrumentation. In computer security, such analysis can provide the basis for detecting, understanding and controlling malicious code. Any analysis of malicious program requires as a first step precisely locating the Function Entry Points (FEPs, the starting byte of each functi...
When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries
The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. Previous work has examined attribution of authors from both source code and compiled binaries, and found that while source code can be attributed with very high accuracy, the attribution of executable binary appears to be much more difficult. Many pote...
BYTEWEIGHT: Learning to Recognize Functions in Binary Code
Function identification is a fundamental challenge in reverse engineering and binary program analysis. For instance, binary rewriting and control flow integrity rely on accurate function detection and identification in binaries. Although many binary program analyses assume functions can be identified a priori, identifying functions in stripped binaries remains a challenge. In this paper, we pro...
Jacobite Explanation of the Trinity in the Context of Muʿtazilite Theology: Abu Raʾitah al-Takriti
The Melkites, Jacobites, and Nestorians were the main Christian communities under Muslim rule. Several pre-Islamic Arab Christian authors wrote treatises concerning their beliefs in Arabic, some of which date back to the early Islamic centuries. The multiplicity of such polemical works suggests an intellectually open society and a degree of tolerance shown by Muslim leaders. Abu Raʾita...